CHARACTERIZATION OF THE YEAST TRANSCRIPTOME



FIELD OF THE INVENTION
This invention is related to the characterization of the expressed genes
of the yeast genome. More particularly, it is related to the identification and
use of previously unrecognized genes.

BACKGROUND OF THE INVENTION

It is by now axiomatic that the phenotype of an organism is largely
determined by the genes expressed within it. These expressed genes can be
represented by a "transcriptome", conveying the identity of each expressed
gene and its level of expression for a defined population of cells. Unlike the
genome, which is essentially a static entity, the transcriptome can be
modulated by both external and internal factors. The transcriptome thereby
serves as a dynamic link between an organism's genome and its physical
characteristics.

The transcriptome as defined above has not been characterized in any
eukaryotic or prokaryotic organism, largely because of technological
limitations. However, some general features of gene expression patterns
were elucidated two decades ago through RNA-DNA hybridization
measurements (Bishop et al., 1974; Hereford and Rosbash, 1977). In many
organisms, it was thus found that at least three classes of transcripts could be
identified, with either high, medium, or low levels of expression, and the
number of transcripts per cell was estimated (Lewin, 1980). These data of
course provided little information about the specific genes that were members
of each class. Data on the expression levels of individual genes have 
accumulated as new genes were discovered. However, in only a few 
instances have the absolute levels of expression of particular genes been 
measured and compared to other genes in the same cell type. 

Description of any cell's transcriptome would therefore provide new 
information useful for understanding numerous aspects of cell biology and 
biochemistry. 

SUMMARY OF THE INVENTION

It is an object of the present invention to provide genes which are 
involved in cell cycle progression. 

It is another object of the present invention to provide methods of using
the genes to affect the cell cycle. 

It is an object of the present invention to provide methods for screening
candidate antifungal drugs. 

Another object of the invention is to provide a method for obtaining 
human homologs of the yeast genes which are involved in cell cycle 
progression. 

Another object of the invention is to provide probes for ascertaining
phase in the cell cycle of a cell. 

These and other objects of the invention are achieved by providing the
art with one or more of the embodiments described below. According to one
embodiment of the invention an isolated DNA molecule is provided. It 
comprises a yeast gene which is involved in cell cycle progression selected 
from the group of NORF genes identified in Table 3 or 4. 

According to another embodiment of the invention a method of using 
yeast genes is provided. The method is for affecting the cell cycle of a cell. 
The method comprises the step of: 

administering to a cell an isolated DNA molecule comprising a
yeast gene which is involved in cell cycle progression selected from the
differentially expressed genes identified in Tables 1, 2, 3 and 4.

In yet another embodiment of the invention a method for screening 
candidate antifungal drugs is provided. The method comprises the steps of: 
contacting a test substance with a yeast cell; 
monitoring expression of a yeast gene which is involved in cell 
cycle progression selected from the group of yeast genes identified in Tables 
1, 2, 3 and 4, wherein a test substance which modifies the expression of the 
yeast gene is a candidate antifungal drug. 

In still another embodiment of the invention a method for identifying
human genes which are involved in cell cycle progression is provided. The 
method comprises the step of: 

hybridizing a probe comprising at least 14 contiguous 
nucleotides of a yeast gene which is differentially expressed between at least 
two phases selected from the group consisting of log phase, S phase, and 
G2/M phase, wherein the yeast gene is identified in Table 1, 2, 3, or 4. 

Also provided by the present invention are isolated DNA molecules, 
which comprise probes for ascertaining phase in the cell cycle of a cell, 
wherein the probe comprises at least 14 contiguous nucleotides of a NORF 
gene as identified in Table 3 or 4. 

These and other embodiments of the invention which will be apparent 
to those of skill in the art upon reading the detailed disclosure provided 
below, make available to the art hitherto unrecognized genes, and information 
about the expression of genes globally at the organismal level. We provide 
the first description of a transcriptome, determined in S. cerevisiae cells. This 
organism was chosen because it is widely used to clarify the biochemical and 
physiologic parameters underlying eukaryotic cellular functions and because
it is the only eukaryote in which the entire genome has been defined at the
nucleotide level (Goffeau, et al., 1996). 



BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1. Schematic of SAGE Method and Genome Analysis.
In applying SAGE to the analysis of yeast gene expression patterns, the 3'
most NlaIII site was used to define a unique position in each transcript and
to provide a site for ligation of a linker with a BsmFI site. The type IIS
enzyme BsmFI, which cleaves a defined distance from its non-palindromic
recognition site, was then used to generate a 15 bp SAGE tag (designated by
the black arrows), which includes the NlaIII site. Automated sequencing of
concatenated SAGE tags allowed the routine identification of about a
thousand tags per sequencing gel. Once sequenced, the abundance of each
SAGE tag was calculated, and each tag was used to search the entire yeast
genome to identify its corresponding gene. The lower panel shows a small
region of Chromosome 15. Gray arrows indicate all potential SAGE tags
(NlaIII sites) and black arrows indicate 3' most SAGE tags. The total number
of tags observed for each potential tag is indicated above (+ strand) or below
(- strand) the tag. As expected, the observed SAGE tags were associated
with the 3' end of expressed genes.

Figure 2. Sampling of Yeast Gene Expression. 

Analysis of increasing amounts of ascertained tags reveals a plateau in the
number of unique expressed genes. Triangles represent genes with known
functions, squares represent genes predicted on the basis of sequence 
information, and circles represent total genes. 

Figure 3. Virtual Rot.

(a) Abundance Classes in the Yeast Transcriptome. The transcript abundance 
is plotted in reverse order on the abscissa, whereas the fraction of total
transcripts with at least that abundance is plotted on the ordinate. The dotted
lines identify the three components of the curve, 1, 2, and 3. This is
analogous to a Rot curve derived from reassociation kinetics where the
product of initial RNA concentration and time is plotted on the abscissa, and
the percent of labeled cDNA that hybridizes to excess mRNA is plotted on
the ordinate.

(b) Comparison of Virtual Rot and Rot Components. Transitions and data
from virtual Rot components were calculated from the data in Figure 3A,
while data for Rot components were obtained from Hereford and Rosbash,
1977. 

Figure 4. Chromosomal Expression Map for S. cerevisiae. Individual yeast 
genes were positioned on each chromosome according to their open reading 
frame (ORF) start coordinates. Abundance levels of tags corresponding to 
each gene are displayed on the vertical axis, with transcription from the + 
strand indicated above the abscissa and that from the - strand indicated below. 
Yellow bands at ends of the expanded chromosome represent telomeric 
regions that are undertranscribed (see text for details). 

Figure 5. Northern Blot Analysis of Representative Genes. TDH2/3,
TEF1/2 and NORF1 are expressed relatively equally in all three states (lane
1, G2/M arrested; lane 2, S phase arrested; lane 3, log phase), while RNR4,
RNR2, and NORF5 are highly expressed in S-phase arrested cells. The
expression level observed by SAGE (number of tags) is noted below each
lane and was highly correlated with quantitation of the Northern blot by
PhosphorImager analysis (r^2 = 0.97).



Table Legends 



Table 1. Highly Expressed Genes 

Tag represents the 10 bp SAGE tag adjacent to the NlaIII site; Gene
represents the gene or genes corresponding to a particular tag (multiple genes
that match unique tags are from related families, with an average identity of
93%); Locus and Description denote the locus name, and functional
description of each ORF, respectively; Copies/cell represents the abundance
of each transcript in the SAGE library, assuming 15,000 total transcripts per
cell and 60,633 ascertained transcripts.
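
The Copies/cell conversion in this legend is a proportional scaling of the raw tag count. A minimal sketch of that arithmetic follows; the function and constant names are illustrative assumptions, and only the two numeric values come from the legend.

```python
# Illustrative names (not from the source); the two constants are the
# values stated in the Table 1 legend.
TOTAL_TRANSCRIPTS_PER_CELL = 15_000  # assumed mRNA molecules per cell
ASCERTAINED_TRANSCRIPTS = 60_633     # total SAGE tags analyzed


def copies_per_cell(tag_count: int) -> float:
    """Scale a raw SAGE tag count to estimated transcripts per cell."""
    return tag_count * TOTAL_TRANSCRIPTS_PER_CELL / ASCERTAINED_TRANSCRIPTS


if __name__ == "__main__":
    # A tag seen 31 times corresponds to roughly 8 transcripts per cell.
    print(round(copies_per_cell(31), 2))
```

Under these assumptions each observed tag corresponds to about a quarter of a transcript per cell.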

Table 2. Expression of Putative Coding Sequences 
Table columns are the same as for Table 1.

Table 3. Expression of NORF genes 

SAGE Tag, Locus, and Copies/cell are the same as for Table 1 ; Chr and Tag 
Pos denote the chromosome and position of each tag; ORF Size denotes the 
size of the ORF corresponding to the indicated tag. In each case, the tag was 
located within or less than 250 bp 3' of the NORF.

DETAILED DESCRIPTION

It is a discovery of the present invention that certain hitherto unknown 
genes (the NORFs) exist and are expressed in yeast. These genes, as well as 
other previously identified and previously postulated genes, can be used to
study, monitor, and affect phase of cell cycle. The present invention provides
information on which genes are differentially expressed during the cell cycle.
Differentially expressed genes can be used as markers of phases of the cell
cycle. They can also be used to effect a change in the phase of the cell cycle.
In addition, they can be used to screen for drugs which affect the cell cycle, 
by affecting expression of the genes. Human homologs of these eukaryotic 
genes are also presumed to exist, and can be identified using the yeast genes 
as probes or primers to identify the human homologs. 



New genes termed NORFs (not previously assigned open reading 
frames) have been found. They are uniquely identified by their SAGE tags. 
In addition their entire nucleotide sequence is known and publicly available. 
In general, these were not previously identified as genes due to their small 
size. However, they have now been found to be expressed. 

Differentially expressed yeast genes are those whose expression varies
by a statistically significant difference (to greater than 95% confidence level)
within different growth phases, particularly log phase, S phase, and G2/M.
Preferably the difference is greater than 10%, 25%, 50%, or 100%. The
genes which have been found to have such differential expression
characteristics are: NORF Nos. 1, 2, 4, 5, 6, 17, 25, 27, TEF1/TEF2, ENO2,
ADH1, ADH2, PGK1, CUP1A/CUP1B, PYK1, YKL056C, YMR116C,
YEL033W, YOR182C, YCR013C, ribonucleotide reductase 2 and 4, and
YJR085C.

The DNA molecules according to the invention can be genomic or 
cDNA. Preferably they are isolated free of other cellular components such
as membrane components, proteins, and lipids. They can be made by a cell
and isolated, or synthesized using PCR or an automatic synthesizer. Any
technique for obtaining a DNA of known sequence may be used. Methods
for purifying and isolating DNA are routine and are known in the art.

To administer yeast genes to cells, any DNA delivery techniques known 
in the art may be used, without limitation. These include liposomes,
transfection, transduction, transformation, viral infection, and electroporation.
Vectors for particular purposes and characteristics can be selected by the 
skilled artisan for their known properties. Cells which can be used as gene 
recipients are yeast and other fungi, mammalian cells, including humans, and 
bacterial cells. 

Antifungal drugs can be identified using yeast cells as described herein. 
Expression of a differentially expressed gene can be monitored by any means
known in the art. When a test substance affects the expression of such a
differentially expressed gene, it is a candidate drug for affecting the growth
properties of fungi, and may be useful as an antifungal agent.

Because differentially expressed genes are likely to be involved in cell
cycle progression, it is likely that these genes are conserved among species.
The differentially expressed genes identified by the present invention can be
used to identify homologs in humans and other mammals. Means for
identifying homologous genes among different species are well known in the
art. Briefly, stringency of hybridization can be reduced so that imperfectly
matching sequences hybridize. This can be in the context of, inter alia,
Southern blots, Northern blots, colony hybridization or PCR. Any
hybridization technique which is known in the art can be used.

Probes according to the present invention are isolated DNA molecules 
which have at least 10, and preferably at least 12, 14, 16, 18, 20, or 25 
contiguous nucleotides of a particular NORF gene or other differentially 
expressed gene. The probes may or may not be labeled. They may be used 
as primers for PCR or for Southern or Northern blots. Preferably the probes
are anchored to a solid support. More preferably they are present on an array
so that multiple probes can simultaneously hybridize to a single biological
sample. The probes can be spotted onto the array or synthesized in situ on
the array. See Lockhart et al., Nature Biotechnology, Vol. 14, December
1996, "Expression monitoring by hybridization to high-density
oligonucleotide arrays." A single array can contain more than 100, 500 or
even 1,000 different probes in discrete locations.
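
As an illustration of what "at least 14 contiguous nucleotides" of a gene means in practice, the following sketch enumerates every contiguous window of a chosen length from a sequence. The function name and the toy sequence are assumptions made for the example; no probe selection criteria beyond contiguity and length are implied.

```python
def contiguous_probes(gene_seq: str, length: int = 14) -> list:
    """Return every contiguous subsequence of the given length.

    gene_seq is assumed to be a DNA sequence written 5' to 3'.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    seq = gene_seq.upper()
    # A sequence shorter than `length` yields no probes.
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


if __name__ == "__main__":
    toy_gene = "ACGTACGTACGTACGTA"  # hypothetical 17-nt sequence
    for probe in contiguous_probes(toy_gene):
        print(probe)
```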

The above disclosure generally describes the present invention. A more 
complete understanding can be obtained by reference to the following specific 
examples which are provided herein for purposes of illustration only, and are 
not intended to limit the scope of the invention. 

Summary

We have analyzed the set of genes expressed from the yeast genome, herein
called the transcriptome, using serial analysis of gene expression (SAGE).
Analysis of 60,633 transcripts revealed 4,665 genes, with expression levels
ranging from 0.3 to over 200 transcripts per cell. Of these genes, 1,981 had
known functions, while 2,684 were previously uncharacterized. Integration
of positional information with gene expression data allowed the generation
of chromosomal expression maps, identifying physical regions of
transcriptional activity, and identified genes that had not been predicted by
sequence information alone. These studies provide insight into global
patterns of gene expression in yeast and demonstrate the feasibility of
genome-wide expression studies in eukaryotes.

Results 

Characteristics and Rationale of SAGE Approach 

Several methods have recently been described for the high throughput
evaluation of gene expression (Nguyen et al., 1995; Schena et al., 1995;
Velculescu et al., 1995). We used SAGE (Serial Analysis of Gene
Expression) because it can provide quantitative gene expression data without
the prerequisite of a hybridization probe for each transcript. The SAGE
technology is based on two basic principles (Figure 1). First, a short
sequence tag (9-11 bp) contains sufficient information to uniquely identify a
transcript, provided that it is derived from a defined location within that
transcript. Second, many transcript tags can be concatenated into a single
molecule and then sequenced, revealing the identity of multiple tags
simultaneously. The expression pattern of any population of transcripts can
be quantitatively evaluated by determining the abundance of individual tags
and identifying the gene corresponding to each tag.
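
The first principle can be illustrated in silico. The sketch below takes the 10 bases immediately 3' of the 3'-most NlaIII site (recognition sequence CATG) in a transcript and tallies tags across a set of transcripts. It is a simplified model under stated assumptions: the function names and toy sequences are hypothetical, the 10-base tag length follows the Table 1 legend, and transcripts with no site or with fewer than 10 bases after the last site are simply skipped.

```python
from collections import Counter
from typing import Iterable, Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence


def sage_tag(transcript: str, tag_len: int = 10) -> Optional[str]:
    """Return the tag adjacent to the 3'-most anchoring site, or None."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR_SITE)  # 3'-most site in a 5'->3' sequence
    if pos == -1:
        return None  # no NlaIII site: transcript is not detectable
    start = pos + len(ANCHOR_SITE)
    tag = seq[start:start + tag_len]
    # Simplifying assumption: require a full-length tag.
    return tag if len(tag) == tag_len else None


def count_tags(transcripts: Iterable[str]) -> Counter:
    """Tally tags over a population of transcripts."""
    tags = (sage_tag(t) for t in transcripts)
    return Counter(t for t in tags if t is not None)


if __name__ == "__main__":
    toy = "AAACATG" + "T" * 10 + "GGCATG" + "ACGTACGTAC" + "AAAA"
    print(sage_tag(toy))               # tag from the second (3'-most) site
    print(count_tags([toy, toy, "AAAA"]))
```

Counting identical tags is what turns the sequence data into an expression level for each transcript.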

Genome-wide expression 

In order to maximize representation of genes involved in normal growth and 
cell-cycle progression, SAGE libraries were generated from yeast cells in 
three states: log phase, S phase arrested and G2/M phase arrested. In total,
SAGE tags corresponding to 60,633 total transcripts were identified
(including 20,184 from log phase, 20,034 from S phase arrested, and 20,415
from G2/M phase arrested cells). Of these tags, 56,291 tags (93%) precisely
matched the yeast genome, 88 tags matched the mitochondrial genome, and 
91 tags matched the 2 micron plasmid. 
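
Matching an observed tag to the genome amounts to looking it up among the potential tags at every NlaIII site on both strands. The following is a minimal sketch of such an index for a single sequence; the function names, the coordinate convention (0-based position of the CATG site on the + strand), and the toy sequence are assumptions for illustration, not the search procedure actually used.

```python
from collections import defaultdict

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence (palindromic)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def potential_tags(chrom_seq: str, tag_len: int = 10) -> dict:
    """Map each potential tag to a list of (strand, site_position)."""
    seq = chrom_seq.upper()
    index = defaultdict(list)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        start = s.find(ANCHOR_SITE)
        while start != -1:
            tag = s[start + 4:start + 4 + tag_len]
            if len(tag) == tag_len:
                # Report the site position in + strand coordinates.
                pos = start if strand == "+" else len(seq) - start - 4
                index[tag].append((strand, pos))
            start = s.find(ANCHOR_SITE, start + 1)
    return dict(index)


if __name__ == "__main__":
    toy_chrom = "TTGGTTGGTT" + "CATG" + "ACGTACGTAC"  # hypothetical
    for tag, hits in potential_tags(toy_chrom).items():
        print(tag, hits)
```

Because the site is palindromic, each site contributes one potential tag per strand, which is why tags are reported for both the + and - strands.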

The number of SAGE tags required to define a yeast transcriptome 
depends on the confidence level desired for detecting low abundance mRNA 
molecules. Assuming the previously derived estimate of 15,000 mRNA 
molecules per cell (Hereford and Rosbash, 1977), 20,000 tags would 
represent a 1.3 fold coverage even for mRNA molecules present at a single 
copy per cell, and would provide a 72% probability of detecting such 
transcripts (as determined by Monte Carlo simulations). Analysis of 20,184
tags from log phase cells identified 3,298 unique genes. As an independent
confirmation of mRNA copy number per cell, we compared the expression
level of SUP44/RPS4, one of the few genes whose absolute mRNA levels
have been reliably determined by quantitative hybridization experiments (Iyer
and Struhl, 1996), with expression levels determined by SAGE.
SUP44/RPS4 was measured by hybridization at 75 +/- 10 copies/cell (Iyer
and Struhl, 1996), in good accord with the SAGE data of 63 copies/cell,
suggesting that the estimate of 15,000 mRNA molecules per cell was
reasonably accurate. Analysis of SAGE tags from S phase arrested and G2/M
phase arrested cells revealed similar expression levels for this gene (range 52
to 55 copies/cell), as well as for the vast majority of expressed genes. As less 
than 1% of the genes were expressed at dramatically different levels among 
these three states (see below), SAGE tags obtained from all libraries were 
combined and used to analyze global patterns of gene expression. 

Analysis of ascertained tags at increasing increments revealed that the 
number of unique transcripts plateaued at ~60,000 tags (Figure 2). This
suggested that generation of further SAGE tags would yield few additional
genes, consistent with the fact that sixty thousand transcripts represented a
four-fold redundancy for genes expressed as low as 1 transcript per cell.
Likewise, Monte Carlo simulations indicated that analysis of 60,000 tags
would identify at least one tag for a given transcript 97% of the time if its
expression level was one copy per cell. 
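
The coverage and detection figures can be approximated with a simple sampling model. The sketch below uses a closed-form binomial calculation in place of the Monte Carlo simulations mentioned in the text, and it assumes that every tag is an independent draw from a pool of 15,000 transcripts; the function names are illustrative. Under these assumptions the model gives roughly 0.74 for 20,000 tags and 0.98 for 60,000 tags, close to but not identical with the 72% and 97% quoted above, whose simulation details are not given here.

```python
def fold_coverage(n_tags: int, transcripts_per_cell: int = 15_000) -> float:
    """Sampling depth relative to the assumed transcripts per cell."""
    return n_tags / transcripts_per_cell


def detection_probability(n_tags: int,
                          copies_per_cell: float = 1.0,
                          transcripts_per_cell: int = 15_000) -> float:
    """Probability of seeing at least one tag for a given transcript.

    Assumes each tag is an independent draw and that the transcript
    makes up copies_per_cell / transcripts_per_cell of the mRNA pool.
    """
    p_single = copies_per_cell / transcripts_per_cell
    return 1.0 - (1.0 - p_single) ** n_tags


if __name__ == "__main__":
    for n in (20_000, 60_000):
        print(n, round(fold_coverage(n), 2),
              round(detection_probability(n), 3))
```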

The 56,291 tags that precisely matched the yeast genome represented 
4,665 different genes. This number is in agreement with the estimate of
3,000 to 4,000 expressed genes obtained by RNA-DNA reassociation kinetics
(Hereford and Rosbash, 1977). These expressed genes included 85% of the
genes with characterized functions (1,981 of 2,340), and 76% of the total
genes predicted from analysis of the yeast genome (4,665 of 6,121). These
numbers are consistent with a relatively complete sampling of the yeast
transcriptome given the limited number of physiological states examined and
the large number of genes predicted solely on the basis of genomic sequence
analysis. 

The transcript expression per gene was observed to vary from 0.3 to 
over 200 copies per cell. Analysis of the distribution of gene expression 
levels revealed several abundance classes that were similar to those observed 
in previous studies using reassociation kinetics. A "virtual Rot" of the genes 
observed by SAGE (Figure 3A) identified three main components of the
transcriptome with abundances ranging over three orders of magnitude. A 
Rot curve derived from RNA-cDNA reassociation kinetics also contained 
three main components distributed over a similar range of abundances 
(Hereford and Rosbash, 1977). Although the kinetics of reassociation of a 
particular class of RNA and cDNA may be affected by numerous 
experimental variables, there were striking similarities between Rot and 
virtual Rot analyses (Figure 3B). Because Rot analysis may not detect all 
transcripts of low abundance (Lewin, 1980), it is not surprising that SAGE 
revealed both a larger total number of expressed genes and a higher fraction 
of the transcriptome belonging to the low abundance transcript class. 
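
The "virtual Rot" construction can be sketched as a cumulative calculation: for each abundance level, taken from highest to lowest, compute the fraction of all transcripts that come from genes with at least that abundance. The function name and toy counts below are assumptions for illustration, and abundance is expressed in raw tag counts rather than copies per cell.

```python
from collections import Counter


def virtual_rot(tag_counts: dict) -> list:
    """Return [(abundance, cumulative_fraction)] from high to low abundance.

    tag_counts maps a gene (or tag) to its observed tag count.
    cumulative_fraction is the share of all tags contributed by genes
    whose count is at least `abundance`.
    """
    total = sum(tag_counts.values())
    if total == 0:
        return []
    tags_at_abundance = Counter()
    for count in tag_counts.values():
        tags_at_abundance[count] += count
    curve = []
    running = 0
    for abundance in sorted(tags_at_abundance, reverse=True):
        running += tags_at_abundance[abundance]
        curve.append((abundance, running / total))
    return curve


if __name__ == "__main__":
    toy_counts = {"geneA": 6, "geneB": 3, "geneC": 1}  # hypothetical
    for abundance, fraction in virtual_rot(toy_counts):
        print(abundance, fraction)
```

Plotting the first column in reverse order against the second gives a curve of the kind described for Figure 3A.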

Integration of Expression Information with the Genomic Map

The SAGE expression data could be integrated with existing positional 
information to generate chromosomal expression maps (Figure 4). These
maps were generated using the sequence of the yeast genome and the position
coordinates of ORFs obtained from the Stanford Yeast Genome Database.
Although there were a few genes that were noted to be physically proximal
and have similarly high levels of expression, there did not appear to be any
clusters of particularly high or low expression on any chromosome. Genes
like histones H3 and H4, which are known to have coregulated divergent
promoters and are immediately adjacent on chromosome 14 (Smith and
Murray, 1983), had very similar expression levels (5 and 6 copies per cell,
respectively). The distribution of transcripts among the chromosomes
suggested that overall transcription was evenly dispersed, with total transcript
levels being roughly linearly related to chromosome size (r^2 = 0.85, data not
shown). However, regions within 10 kb of telomeres appeared to be
uniformly undertranscribed, containing on average 3.2 tags per gene as
compared with 12.4 tags per gene for non-telomeric regions (Figure 4). This
is consistent with the previously described observations of "telomeric
silencing" in yeast (Gottschling et al., 1990). Recent studies have reported
telomeric position effects as far as 4 kb from telomere ends (Renauld et al., 
1993). 
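
The telomeric comparison can be expressed as a simple partition of genes by distance from the nearest chromosome end. The sketch below is an assumption-laden illustration: the function name, the input layout (chromosome, ORF start coordinate, tag count), and the toy chromosome length and counts are all hypothetical, and only the 10 kb window comes from the text.

```python
def mean_tags_by_region(genes, chrom_lengths, window=10_000):
    """Return (mean tags per telomeric gene, mean tags per other gene).

    genes: iterable of (chromosome, orf_start, tag_count) tuples.
    chrom_lengths: mapping of chromosome name to length in bp.
    A gene is telomeric if its start lies within `window` bp of
    either chromosome end.
    """
    telomeric, other = [], []
    for chrom, start, tags in genes:
        distance = min(start, chrom_lengths[chrom] - start)
        (telomeric if distance <= window else other).append(tags)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(telomeric), mean(other)


if __name__ == "__main__":
    lengths = {"chrI": 230_000}  # hypothetical length
    toy_genes = [
        ("chrI", 2_000, 2),
        ("chrI", 228_000, 4),
        ("chrI", 100_000, 12),
        ("chrI", 120_000, 14),
    ]
    print(mean_tags_by_region(toy_genes, lengths))
```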

Gene Expression Patterns 

Table 1 lists the 30 most highly expressed genes, all of which are expressed
at greater than 60 mRNA copies per cell. As expected, these genes mostly
correspond to well characterized enzymes involved in energy metabolism and
protein synthesis and were expressed at similar levels in all three growth
states (examples in Figure 5). Some of these genes, including ENO2
(McAlister and Holland, 1982), PDC1 (Schmitt et al., 1983), PGK1
(Chambers et al., 1989), PYK1 (Nishizawa et al., 1989), and ADH1 (Denis et
al., 1983), are known to be dramatically induced in the glucose-rich growth
conditions used in this study. In contrast, glucose repressible genes such as
the GAL1/GAL7/GAL10 cluster (St John and Davis, 1979), and GAL3 (Bajwa
et al., 1988) were observed to be expressed at very low levels (0.3 or fewer
copies per cell). As expected for the yeast strain used in this study, mating
type a specific genes, such as the a factor genes (MFA1, MFA2) (Michaelis
and Herskowitz, 1988), and alpha factor receptor (STE2) (Burkholder and
Hartwell, 1985) were all observed to be expressed at significant levels (range
2 to 10 copies per cell), while mating type alpha specific genes (MFα1,
MFα2, STE3) (Hagen et al., 1986; Kurjan and Herskowitz, 1982; Singh et al.,
1983) were observed to be expressed at very low levels (<0.3 copies/cell).

Three of the highly expressed genes in Table 1 had not been previously 
characterized. One contained an ORF with predicted ribosomal function,
previously identified only by genomic sequence analysis. Analyses of all
SAGE data suggested that there were 2,684 such genes corresponding to
uncharacterized ORFs which were transcribed at detectable levels. The 30
most abundant of these transcripts were observed more than 30 times,
corresponding to at least 8 transcripts per cell (Table 2). The other two
highly expressed uncharacterized genes corresponded to ORFs not predicted
by analysis of the yeast genome sequence (NORF = Nonannotated ORF).
Analyses of SAGE data suggested that there were approximately 160 NORF
genes transcribed at detectable levels. The 30 most abundant of these 
transcripts were observed at least 9 times (Table 3 and examples in Figure 5). 

Interestingly, one of the NORF genes (NORF5) was only expressed in
S phase arrested cells and corresponded to the transcript whose abundance
varied the most in the three states analyzed (> 49 fold, Figure 5).
Comparison of S phase arrested cells to the other states also identified greater
than 9 fold elevation of the RNR2 and RNR4 transcripts (Figure 5). Induction
of these ribonucleotide reductase genes is likely to be due to the hydroxyurea
treatment used to arrest cells in S phase (Elledge and Davis, 1989).
Likewise, comparison of G2/M arrested cells identified elevation of RBL2
and dynein light chain, both microtubule associated proteins (Archer et al.,
1995; Dick et al., 1996). As with the RNR inductions, these elevated levels
seem likely to be related to the nocodazole treatment used to arrest cells in
the G2/M phase. While there were many relatively small differences between
the states (for example, NORFl, Figure 5), overall comparison of the three 
states revealed surprisingly few dramatic differences; there were only 29 
transcripts whose abundance varied more than 10 fold among the three 
different states analyzed. 
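
A fold-change screen of this kind can be sketched as follows. The example is hypothetical throughout: the gene names and counts are invented, raw tag counts are compared directly on the assumption that the three libraries are of nearly equal size, and a count of zero is replaced by a floor of one tag so that a ratio can be formed. None of these choices is stated in the text.

```python
def max_fold_change(counts, floor=1):
    """Largest ratio between any two states, with zero counts floored.

    counts: tag counts for one transcript in each state
    (for example log phase, S phase arrested, G2/M arrested).
    """
    adjusted = [max(c, floor) for c in counts]
    return max(adjusted) / min(adjusted)


def varying_transcripts(table, threshold=10.0):
    """Names of transcripts whose abundance varies more than threshold-fold."""
    return sorted(
        name for name, counts in table.items()
        if max_fold_change(counts) > threshold
    )


if __name__ == "__main__":
    toy_table = {  # hypothetical counts: (log, S arrested, G2/M arrested)
        "geneA": (0, 50, 0),
        "geneB": (20, 22, 19),
        "geneC": (3, 40, 5),
    }
    print(varying_transcripts(toy_table))
```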

Discussion 

Analysis of a yeast transcriptome affords a unique view of the RNA 
components defining cellular life. We observed gene expression levels to vary
over three orders of magnitude, with the transcripts involved in energy
metabolism and protein synthesis the most highly expressed. Key transcripts,
such as those encoding enzymes required for DNA replication (e.g. POL1 and
POL3), kinetochore proteins (NDC10 and SKP1), and many other interesting
proteins, were present at 1 or fewer copies per cell on average. These
abundances are consistent with previous qualitative data from reassociation
kinetics which suggested that the largest number of expressed genes was
present at 1 or 2 copies per cell. These observations indicate that low
transcript copy numbers are sufficient for gene expression in yeast, and
suggest that yeast possess a mechanism for rigid control of RNA abundance. 

The synthesis of chromosomal expression maps presents a cataloging 
of the expression level of genes, organized by their genomic positions. It is 
not surprising that gene expression is well balanced throughout the 16 
chromosomes of S. cerevisiae. As most genes have independent regulatory 
elements, it would have been surprising to find a large number of physically 
adjacent genes that had similar high levels of expression. Of the few genes 
that were known to have coregulated divergent promoters, like the H3/H4
pair, SAGE data confirmed concordant levels of expression. For areas like
telomeric ends that are known to be transcriptionally suppressed, SAGE data
corroborated low levels of expression. Other expected expression patterns
such as high levels of glucose induced glycolytic enzymes, low levels of
glucose repressed GAL genes, expression of mating type a specific genes, and
low levels of expression of mating type alpha genes, were observed. Finally,
identification of tags corresponding to NORF genes suggests that there is a
significant number of small proteins encoded by the yeast genome that were
undetected by the criteria used for systematic sequence analysis. The yeast
genome sequence has been annotated for all ORFs larger than 300 bp
(encoding proteins 100 amino acids or greater). Genes encoding proteins
below this cutoff are therefore commonly unannotated. This class of genes
might also be underrepresented in mutational collections because of the small 
target size for mutagenesis, and given their small size, may encode proteins 
with novel functions. The systematic knockout of these NORF genes will 
therefore be of great interest. 

Comparison of gene expression patterns from altered physiologic states 
can provide insight into genes that are important in a variety of processes. 
Comparison of transcriptomes from a variety of physiologic states should 
provide a minimum set of genes whose expression is required for normal 
vegetative growth, and another set composed of genes that will be expressed 
only in response to specific environmental stimuli, or during specialized 
processes. For example, recent work has defined a minimal set of 250 genes 
required for prokaryotic cellular life (Mushegian and Koonin, 1996).
Examination of the yeast genome readily identified homologous genes for 196 
of these, over 90% of which were observed to be expressed in the SAGE 
analysis. Detailed analyses of yeast transcriptomes, as well as transcriptomes 
from other organisms, should ultimately allow the generation of a minimal set 
of genes required for eukaryotic life. 

Like other genome-wide analyses, SAGE analysis of yeast 
transcriptomes has several potential limitations. First, a small number of 
transcripts would be expected to lack an Nlalll site and therefore would not 
be detected by our analysis. Second, our analysis was limited to transcripts 
found at least as frequently as 0.3 copies per cell. Transcripts expressed in 
only a minute fraction of the cell cycle, or transcripts expressed in only a 
fraction of the cell population, would not be reliably detected by our analysis.
Finally, mRNA sequence data are practically unavailable for yeast, and
consequently, some SAGE tags cannot be unambiguously matched to
corresponding genes. Tags which were derived from overlapping genes, or
genes which have unusually long 3' untranslated regions may be misassigned.
Increased availability of 3' UTR sequences in yeast mRNA molecules should
help to resolve the ambiguities. 

Despite these potential limitations, it is clear that the analyses described
here furnish both global and local pictures of gene expression, precisely
defined at the nucleotide level. These data, like the sequence of the yeast
genome itself, provide simple, basic information integral to the interpretation
of many experiments in the future. The availability of mRNA sequence
information from EST sequencing as well as various genome projects, will
soon allow definition of transcriptomes from a variety of organisms, including
human. The data recorded here suggest that a reasonably complete picture
of a human cell transcriptome will require only about 10-20 fold more tags
than evaluated here, a number well within the practical realm achievable with
a small number of automated sequencers. The analysis of global expression 
patterns in higher eukaryotes is expected, in general, to be similar to those 
reported here for S. cerevisiae. However, the analysis of the transcriptome 
in different cells and from different individuals should yield a wealth of 
information regarding gene function in normal, developmental, and disease 
states. 

Experimental Procedures 
Yeast cell culture 

The source of transcripts for all experiments was S. cerevisiae strain YPH499
(MATa ura3-52 lys2-801 ade2-101 leu2-Δ1 his3-Δ200 trp1-Δ63) (Sikorski
and Hieter, 1989). Logarithmically growing cells were obtained by growing
yeast cells to early log phase (3 x 10^ cells/ml) in YPD (Rose et al, 1990) 
rich medium (YPD supplemented with 6 mM uracil, 4.8 mM adenine and 24
mM tryptophan) at 30°C. For arrest in the G1/S phase of the cell cycle,
hydroxyurea (0.1 M) was added to early log phase cells, and the culture was
incubated an additional 3.5 hours at 30°C. For arrest in the G2/M phase of
the cell cycle, nocodazole (15 µg/ml) was added to early log phase cells and
the culture was incubated for an additional 100 minutes at 30°C. Harvested
cells were washed once with water prior to freezing at -70°C. The growth
states of the harvested cells were confirmed by microscopic and flow 
cytometric analyses (Basrai et al., 1996). 

RNA isolation and Northern Blot Analysis 

Total yeast RNA was prepared using the hot phenol method as described
(Leeds et al., 1991). mRNA was obtained using the MessageMaker Kit
(Gibco/BRL) following the manufacturer's protocol. Northern blot analysis
was performed as described (El-Deiry et al., 1993), using probes
PCR-amplified from yeast genomic DNA.

SAGE protocol 

The SAGE method was performed as previously described (Velculescu et al.,
1995), with exceptions noted below. PolyA RNA was converted to
double-stranded cDNA with a BRL synthesis kit using the manufacturer's protocol
except for the inclusion of primer biotin-5'-T18-3'. The cDNA was cleaved
with NlaIII (Anchoring Enzyme). As NlaIII sites were observed to occur
once every 309 base pairs in three arbitrarily chosen yeast chromosomes (1,
5, 10), 95% of yeast transcripts were predicted to be detectable with an
NlaIII-based SAGE approach. After capture of the 3' cDNA fragments on
streptavidin coated magnetic beads (Dynal), the bound cDNA was divided
into two pools, and one of the following linkers containing recognition sites
for BsmFI was ligated to each pool:

Linker 1,
5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3' (SEQ ID NO:1),
5'-TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC[amino mod. C7]-3' (SEQ ID NO:2);

Linker 2,
5'-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3' (SEQ ID NO:3),
5'-TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG[amino mod. C7]-3' (SEQ ID NO:4).

As BsmFI (Tagging Enzyme) cleaves 14 bp away from its recognition
site, and the NlaIII site overlaps the BsmFI site by 1 bp, a 15 bp SAGE tag
was released with BsmFI. SAGE tag overhangs were filled in with Klenow,
and tags from the two pools were combined and ligated to each other. The
ligation product was diluted and then amplified with PCR for 28 cycles with
5'-GGATTTGCTGGTGCAGTACA-3' (SEQ ID NO:5) and
5'-CTGCTCGAATTCAAGCTTCT-3' (SEQ ID NO:6) as primers. The PCR
product was analyzed by polyacrylamide gel electrophoresis (PAGE), and the
PCR product containing two tags ligated tail to tail (ditag) was excised. The
PCR product was then cleaved with NlaIII, and the band containing the ditags
was excised and self-ligated. After ligation, the concatenated products were
separated by PAGE and products between 500 bp and 2 kb were excised.
These products were cloned into the SphI site of pZero (Invitrogen).
Colonies were screened for inserts by PCR with M13 forward and M13
reverse sequences located outside the cloning site as primers.
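
Because the cloned concatemers consist of ditags punctuated by NlaIII sites (CATG), the individual tags can be recovered computationally from a sequence read. The following is a minimal Python sketch of that idea, not the SAGE program group itself. The function names, the accepted ditag length range (20 to 26 bp), and the example read are assumptions made for illustration; a tag that itself contains CATG would be split incorrectly by this simplified approach.

```python
ANCHOR = "CATG"   # NlaIII recognition site
TAG_LEN = 10      # bases kept per tag, adjacent to the anchoring site

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(read: str, min_len: int = 20, max_len: int = 26):
    """Return (tag1, tag2) pairs for every ditag found between anchor sites.

    The left tag is read as-is; the right tag lies on the opposite strand
    (tags are joined tail to tail), so it is reverse-complemented.
    """
    pairs = []
    for ditag in read.upper().split(ANCHOR):
        if min_len <= len(ditag) <= max_len:
            pairs.append((ditag[:TAG_LEN], revcomp(ditag[-TAG_LEN:])))
    return pairs


# Hypothetical read containing a single 20 bp ditag flanked by CATG sites.
example_read = "CATGGGCGCAATTTTCATCACTTACATG"
print(extract_tags(example_read))
```

Fragments outside the accepted length range are simply skipped, which discards linker remnants and truncated ditags at the cost of losing some valid tags.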

PCR products from selected clones were sequenced with the TaqFS
DyePrimer kits (Perkin Elmer) and analyzed using a 377 ABI automated 
sequencer (Perkin Elmer), following the manufacturer's protocol. Each 
successful sequencing reaction identified an average of 26 tags; given a 90% 
sequencing reaction success rate, this corresponded to an average of about 
850 tags per sequencing gel. 
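
The throughput figures above can be checked with a short calculation. The number of reactions loaded per gel is not stated in the text, so the sketch below back-calculates it from the reported values; the variable names are illustrative and the implied lane count is an inference, not a reported parameter.

```python
tags_per_reaction = 26   # average tags per successful sequencing reaction
success_rate = 0.90      # fraction of sequencing reactions that succeed
tags_per_gel = 850       # approximate average reported per sequencing gel

# Expected tags per attempted reaction, counting failures as zero tags.
tags_per_attempt = tags_per_reaction * success_rate

# Number of reactions per gel implied by the reported figures (an inference).
implied_reactions_per_gel = tags_per_gel / tags_per_attempt

print(f"{tags_per_attempt:.1f} tags per attempted reaction")
print(f"about {implied_reactions_per_gel:.0f} reactions per gel implied")
```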

SAGE data analysis 

Sequence files were analyzed by means of the SAGE program group
(Velculescu et al., 1995), which identifies the anchoring enzyme site with the
proper spacing, extracts the two intervening tags, and records them in a
database. The 68,691 tags obtained contained 62,965 tags from unique
ditags and 5,726 tags from repeated ditags. The latter were counted only
once to eliminate potential PCR bias of the quantitation, as described
(Velculescu et al., 1995). Of 62,965 tags, 2,332 tags corresponded to linker
sequences, and were excluded from further analysis. Of the remaining tags,
4,342 tags could not be assigned, and were likely due to sequencing errors (in
the tags or in the yeast genomic sequence). If all of these were due to tag
sequencing errors, this corresponds to a sequencing error rate of about 0.7%
per base pair (for a 10 bp tag), not far from what we would have expected
under our automated sequencing conditions. However, some unassigned tags
had a much higher than expected frequency of A's as the last five base pairs
of the tag (5 of the 52 most abundant unassigned tags), suggesting that these
tags were derived from transcripts containing anchoring enzyme sites within
several base pairs from their polyA tails. Given the frequency of NlaIII sites
in the genome (one in 309 base pairs), approximately 3% of transcripts were
predicted to contain NlaIII sites within 10 bp of their polyA tails.
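
The tag accounting and the quoted error rate can be reproduced from the numbers in the paragraph above. The sketch below assumes that every unassigned tag carries at least one error in its 10 bp and that errors are independent across bases, so that the per-base rate p satisfies 1 - (1 - p)^10 = fraction unassigned; the variable names are illustrative.

```python
total_tags = 68_691
unique_ditag_tags = 62_965
repeated_ditag_tags = 5_726
assert unique_ditag_tags + repeated_ditag_tags == total_tags

linker_tags = 2_332
unassigned_tags = 4_342
tag_length = 10

# Tags remaining after linker-derived tags are excluded.
analyzed_tags = unique_ditag_tags - linker_tags
fraction_unassigned = unassigned_tags / analyzed_tags

# Per-base error rate if all unassigned tags were due to tag sequencing errors.
per_base_error = 1 - (1 - fraction_unassigned) ** (1 / tag_length)

# Chance that an NlaIII site (one per 309 bp) falls within 10 bp of the polyA tail.
site_spacing = 309
fraction_near_polyA = tag_length / site_spacing

print(f"analyzed tags:        {analyzed_tags}")
print(f"fraction unassigned:  {fraction_unassigned:.3f}")
print(f"per-base error rate:  {per_base_error:.4f}")
print(f"sites near polyA:     {fraction_near_polyA:.3f}")
```

Under these assumptions the calculation gives a per-base rate of roughly 0.7% and a polyA-adjacent fraction of roughly 3%, in line with the values stated in the text.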

As very sparse data are available for yeast mRNA sequences and efforts
to date have not been able to identify a highly conserved polyadenylation
signal (Irniger and Braus, 1994; Zaret and Sherman, 1982), we used 14 bp of
SAGE tags (i.e. the NlaIII site plus the adjacent 10 bp) to search the yeast
genome directly (yeast genome sequence obtained from the Stanford yeast
genome ftp site (genome-ftp.stanford.edu) on August 7, 1996). Because only
coding regions are annotated in the yeast genome, and SAGE tags can be
derived from 3' untranslated regions of genes, a SAGE tag was considered to
correspond to a particular gene if it matched the ORF or the region 500 bp
3' of the ORF (locus names, gene names and ORF chromosomal coordinates
were obtained from the Stanford yeast genome ftp site, and ORF descriptions
were obtained from the MIPS www site (http://www.mips.biochem.mpg.de/) on
August 14, 1996). ORFs were considered genes with known functions if they
were associated with a three letter gene name, while ORFs without such
designations were considered uncharacterized.
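
The assignment rule described above (a tag belongs to a gene if it matches the ORF or the 500 bp immediately 3' of it, in the sense orientation) can be sketched as follows. This is an illustrative Python sketch only: the `Orf` record, the `assign` function, the 1-based inclusive coordinates and the example ORFs are all hypothetical and are not taken from the annotation files cited in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    chrom: int
    start: int    # 1-based, inclusive, start < end regardless of strand
    end: int
    strand: str   # "+" or "-"


DOWNSTREAM = 500  # bp 3' of the ORF still attributed to the gene


def assign(chrom: int, pos: int, strand: str, orfs: list[Orf]) -> list[str]:
    """Names of ORFs whose body or 3' flank contains a sense-strand tag match."""
    hits = []
    for orf in orfs:
        if orf.chrom != chrom or orf.strand != strand:
            continue  # wrong chromosome, or tag is antisense to this ORF
        if orf.strand == "+":
            lo, hi = orf.start, orf.end + DOWNSTREAM
        else:
            # On the minus strand the 3' flank lies at lower coordinates.
            lo, hi = orf.start - DOWNSTREAM, orf.end
        if lo <= pos <= hi:
            hits.append(orf.name)
    return hits


# Hypothetical annotation: one ORF on each strand of chromosome 4.
orfs = [Orf("ORF_A", 4, 1000, 2000, "+"), Orf("ORF_B", 4, 3000, 4000, "-")]
print(assign(4, 2300, "+", orfs))  # inside the 3' flank of ORF_A
print(assign(4, 2600, "-", orfs))  # inside the 3' flank of ORF_B
```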

As expected, SAGE tags matched transcribed portions of the genome 
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in a highly non-random fashion, with 88% matching ORFs or their adjacent 
3' regions in the correct orientation (chi-squared P value <10"^. In instances 
when more than one tag matched a particular ORF in the correct orientation,
the abundance was calculated to be the sum of the matched tags (for Figure
2, Figure 3, and Figure 4). Tags that matched ORFs in the incorrect
orientation were not used in abundance calculations. In instances when a tag
matched more than one region of the genome (for example an ORF and
non-ORF region) only the matched ORF was considered. In some cases the 15th
base of the tag could also be used to resolve ambiguities. For Figure 4, only
tags that matched the genome once were used.
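
The abundance rules above can be expressed as a short aggregation. The sketch below is a hypothetical Python illustration: the input structures and the function name are assumptions, non-ORF matches are represented by `None`, and a tag matching several ORFs in the sense orientation is counted toward each of them, which is a simplification the text does not address.

```python
from collections import defaultdict


def orf_abundance(tag_counts, tag_matches):
    """Sum tag counts per ORF, using sense-orientation ORF matches only.

    tag_counts:  dict mapping tag -> observed count
    tag_matches: dict mapping tag -> list of (orf_name_or_None, is_sense)
    """
    totals = defaultdict(int)
    for tag, count in tag_counts.items():
        sense_orfs = {
            name
            for name, is_sense in tag_matches.get(tag, [])
            if name is not None and is_sense   # drop non-ORF and antisense hits
        }
        for name in sense_orfs:
            totals[name] += count
    return dict(totals)


# Hypothetical data: two tags map to ORF_A, one tag is antisense to ORF_B.
counts = {"AAAAAAAAAA": 3, "CCCCCCCCCC": 2, "GGGGGGGGGG": 5}
matches = {
    "AAAAAAAAAA": [("ORF_A", True)],
    "CCCCCCCCCC": [("ORF_A", True), (None, True)],
    "GGGGGGGGGG": [("ORF_B", False)],
}
print(orf_abundance(counts, matches))
```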

For the identification of NORF genes, only tags were considered that 
matched portions of the genome that were further than 500 bp 3' of a
previously identified ORF, and were observed at least two times in the SAGE 
libraries. 
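
The NORF selection criteria can likewise be written as a simple filter. The following Python sketch is illustrative only: the function name, the tuple layout for ORFs and the example coordinates are hypothetical, and tag orientation is ignored, which is an assumption not stated in the text.

```python
def is_norf_candidate(count, chrom, pos, orfs, flank=500, min_count=2):
    """True if a tag match lies outside every ORF and its 3' flank
    and the tag was observed at least `min_count` times.

    orfs: iterable of (chrom, start, end, strand) with start < end.
    """
    if count < min_count:
        return False
    for orf_chrom, start, end, strand in orfs:
        if orf_chrom != chrom:
            continue
        # Extend the ORF by `flank` bp on its 3' side.
        lo, hi = (start, end + flank) if strand == "+" else (start - flank, end)
        if lo <= pos <= hi:
            return False
    return True


# Hypothetical annotation with a single plus-strand ORF on chromosome 4.
orfs = [(4, 1000, 2000, "+")]
print(is_norf_candidate(2, 4, 2400, orfs))  # within 500 bp 3' of the ORF
print(is_norf_candidate(2, 4, 2600, orfs))  # beyond the 3' flank
```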



WO 98/32847    PCT/US98/01216

[Pages 21-23: tabular data, illegible in the source]



TABLE 4 



Additional NORFs 



GGCGCAATTT    4   1108395  2
TAAGTGATGA    7   593382   2
TTGTTGAATT    10  608373   2
GAAGCAGTAA    3   155607   2
ACATATGTTA    4   916112   2
CCCTACACGG    6   223289   2
GTAATTGGAC    10  392099   2
ATCAGACAAA    14  687272   2
TTATGAAAGA    15  81263    2
ATTCGTTCTA    15  841970   2
AGCAGGAGTT    16  188350   2
TTCTATTAGG    2   418749   2
TGGATTTCAG    4   1224930  2
CAGATATAAT    5   52488    2
CTGTTTTGGG    11  374761   2
CATTTTTAGT    11  508212   2
TTGAAAAGAT    13  104160   2
TAAGCCCATC    13  251273   2
AGCGTCCTCA    15  832420   2
TTTAGTTAAT    2   477623   2
ATGGTAGCCA    3   56961    2
AATTAGACTA    3   162589   2
AGTGACTCTT    4   1490879  2
GGACTATAAG    5   251266   2
ACTTTTTCAG    10  159213   2
GTCATATAGT    13  158765   2
CAACAAAGTG    13  171166   2
GTGGGAAAGG    13  804600   2
TACTTTATAT    16  366449   2
AATACCAGCG    3   175540
GCCTTGTATA    4   372624
GGTACATTCA    5   67152
GATTTCTCTG    5   187462
TAGTTGCTCC    7   317108
GTAAGAAATC    7   836202
CTTGGGCTAT    8   107992
AAATGGTGAT    11  558686
ATCATTTGGG    12  199358
CTGAACTTTA    12  283720
CCAGAAGGAG    13  652873
CCGGTTACTA    15  803663
CGATGAGAAG    15  1004369
AAACCGTCCC    16  199141
TCATTCATAC    2   164728
TATCTTTTTG    4   169784
TTAGAATAAT    4   603508
GTACGCTGTG    5   118089
TATATTAATT    6   64228
GTTCTTGCCT    7   939579   1
ATATAGCTGC    10  181144   1
CCAAAAAAAA    11  91785    1
GAACTCCACA    11  94125    1
CCTTCACTGC    11  374172   1
CACATCATAA    11  625896   1
GAAGTATTGA    12  603999   1
TGCGCGTATA    13  206410   1
GGGTAGTACT    13  671730   1
TAGTTTTGTC    15  33475    1
CAATTCCTAC    1   172182   0.8
TTTGATTTGA    2   46431    0.8
GGCTCTGGTT    2   414510   0.8
CAGAAATAGC    2   565130   0.8
CTGTTATTTT    2   616054   0.8
CGAAGTCAAA    2   680605   0.8
CTCTAGATAA    3   171584   0.8
AGTCAAAATG    4   192750   0.8
GCGAGTTTAG    4   691301   0.8
GCTCCAATAG    4   1131020  0.8
TTTATTTGAG    4   1237501  0.8
GTTATATTGA    4   1401803  0.8
TGGGTTGAAG    5   251266   0.8
ATTTTATTTG    5   447729   0.8
ATGATAAAAA    5   548612   0.8
TTATATAAAA    6   223182   0.8
CTACTTCTGC    8   34653    0.8
ATAAGACAGT    10  227802   0.8
TTCATAAGTT    10  471894   0.8
TAAATCTGAG    11  145617   0.8
CTGGTAGAAA    11  151174   0.8
CACGTACACA    11  403208   0.8
CCAAGATCAA    11  425882   0.8
AGCTTGTTCC    12  234966   0.8
CACATTCGTT    12  759953   0.8
CTTACATATA    12  789781   0.8
TCTATAGCAA    13  228936   0.8
CCTTTCTGAA    13  297985   0.8
CCTTTAGAAT    13  777999   0.8
AATTAACACC    13  842122   0.8
GCGCAGGGGC    14  440984   0.8
TGTTTATAAA    14  661710   0.8
AAAAGTCATT    15  32081    0.8
TTCGTAAACT    15  680625   0.8
TTTTTGGAGT    15  888343   0.8
AGGCATCTTG    16  250284   0.8
AAATCAAAAC    16  453890   0.8
AATTGACGAA    16  560169   0.8
TTGATGATTT    16  582360   0.8
CCTGTTTTTG    16  643476   0.8
TTTTTAAAAA    1   101436   0.5
AAGTTTGATC    1   199848   0.5
AGCACCTATG    2   46913    0.5
TGATTTATCC    2   418946   0.5
ACTGCATCTG    2   680860   0.5
CAAGTTAGGA    2   744770   0.5
ATACCCAATT    3   29939    0.5
AACTTTGTAT    3   30056    0.5
GCGGCGGGTG    3   41646    0.5
AAAATTGTTC    3   57108    0.5
TCAAGTACTC    3   157855   0.5
AACTGTATGC    3   223882   0.5
CTATCGGCCA    3   278840   0.5
ACAAGCCCAA    3   289917   0.5
GTACAGGGCT    4   93873    0.5
AAGATCATCG    4   254851   0.5
GAACTCCTGG    4   340891   0.5
GAACGAGAAG    4   371850   0.5
nil l AATAC   4   372058   0.5
TCTCCAGTTG    4   381712   0.5
AATACGTTAC    4   471791   0.5
ACGATTGGCT    4   509158   0.5
TGTTTATAAG    4   521709   0.5
CGI 1 I IGGTC  4   538839   0.5
TCGAACCTCT    4   578702   0.5
TCCACACACA    4   930972   0.5
CCGTGCGTGC    4   1324367  0.5
TTTCTTCAAC    5   116099   0.5
CCAAGTCTCG    5   159320   0.5
AGAGCGAATT    5   207517   0.5
TGTAGATTAT    5   280465   0.5
AAAAGTAGTT    5   286387   0.5
ACTTGGTATG    5   422942   0.5
TTAATGTTAT    5   544523   0.5
TACACGCGCG    5   544555   0.5
GGTCACTCCT    6   62983    0.5
AAGTGATGAA    6   76141    0.5
TTTATCTTGT    6   130327   0.5
AGTGATTGTT    6   256223   0.5
GCTTTGTTGT    7   72577    0.5
TCATTGATTC    7   110590   0.5
TTCACCGGAA    7   323655   0.5
ACTATTCTGT    7   423957   0.5
GGGCCAACCC    7   433787   0.5
AAAATATCTT    7   559397   0.5
TAGTAGTAAC    7   622201   0.5
AAGCGCACAA    7   735909   0.5
TCGCTGTl TV   7   800300   0.5
TGTA 1 1 1 II G  7   836202   0.5
CTAAACAAAG    7   836587   0.5
TAGGAAGAAA    7   905046   0.5
GGAAAAATTA    7   958839   0.5
TTTGGA 1 AO 1  7   974754   0.5
CGTTTGl 1 A   8   202655   0.5
AoAAAAAMMo    8   386651   0.5
TA A A/^TOPAr^  8   518998   0.5
[illegible]   8   529129   0.5
AT/^AriPATTT  9   97114    0.5
A/^f^TPiPAAAA  9   229077   0.5
TA APAAAftlA^  10  628227   0.5
LfAA 1 1 oIjOMM  10  721781   0.5
AU 1 1 o 1 M  11  93528    0.5
L» 1 U 1 A 1 1 o A 1  11  144281   0.5
OLr 1 1 1 1 1 1  11  146665   0.5
ACCGCAAAoA    11  231872   0.5
CM G 1 1 UAAA  12  230972   0.5
AATG T GU 1 V3 1  12  320426   0.5
GCAGATAGOG    12  341324   0.5
TCTGACT 1 AG  12  368780   0.5
CCCGGAIGI 1   12  433912   0.5
GTAACGAl lo   12  449917   0.5
GAATAACGAA    12  673851   0.5
ACTGCTAI 1 1  12  712476   0.5
Gl 1 U 1 U 1 AoU  12  712712   0.5
CATCACOA I U  12  794710   0.5
TTGCACTTC 1   12  806833   0.5
ACTGl 1 1 A 1 G  12  867350   0.5
IT Gu 1 A 1 A 1 A  12  1017911  0.5
XAi^AXXPXA A  13  95707    0.5
CTCT 1 AG 1 1 G  13  158970   0.5
ACGAALrAGI 1  13  278341   0.5
TGCGCAAG 1 O  13  283795   0.5
1 1 M 1 U t I AM  13  363037   0.5
CAAAIGUAI 1   13  390802   0.5
CAAA 1 1 G 1 G 1  13  395599   0.5
GCAATACTAl    13  826521   0.5
AGTGACGA i G  14  60143    0.5
TAClGGi 1 lA  14  118854   0.5
GTTTGAuo 1 A  14  335512   0.5
AGCGTI IGAI   14  478481   0.5
CTCTG 1 1 GUG  14  728251   0.5
AAATTCAAAA    15  35952    0.5
I 1 1 GU 1 1 Go 1  15  242742   0.5
AGTI 1 1 GG 1 G  15  304813   0.5
TTTAAAGA 1 A  15  331453   0.5
AAGGAoAOMo    15  448624   0.5
CTATATATCA    15  544530   0.5
GATGGAATAG    15  571210   0.5
TCGAGTGGAA    15  758202   0.5
AAAAAAGAAA    15  882567   0.5
TTTCCAGAAT    15  969884   0.5
TGGACAATGT    15  970607   0.5
GGAATTAAGA    15  979894   0.5
ACTATATGTT    16  582230   0.5
GATATATCAT    16  589647   0.5
AGAATTGATT    16  744406   0.5
CACTGTCTCC    16  824649   0.5

1. An isolated DNA molecule comprising a yeast gene which is involved
in cell cycle progression selected from the group of NORF genes
identified in Tables 3 and 4.

2. The isolated DNA molecule of claim 1 wherein expression of
the NORF gene varies by at least 10% between any two phases of the
cell cycle selected from the group consisting of: log phase, S phase, and
G2/M.

3. The isolated DNA molecule of claim 1 wherein expression of
the NORF gene varies by at least 25% between any two phases of the
cell cycle selected from the group consisting of: log phase, S phase, and
G2/M.

4. The isolated DNA molecule of claim 1 wherein expression of
the NORF gene varies by at least 50% between any two phases of the
cell cycle selected from the group consisting of: log phase, S phase, and
G2/M.

5. The isolated DNA molecule of claim 1 wherein expression of
the NORF gene varies by at least 100% between any two phases of the
cell cycle selected from the group consisting of: log phase, S phase, and
G2/M.

6. The isolated DNA molecule of claim 1 wherein expression of
the NORF gene varies by a statistically significant difference (greater
than 95% confidence level) between any two phases of the cell cycle
selected from the group consisting of: log phase, S phase, and G2/M.

7. The isolated DNA molecule of claim 6 wherein the NORF is
selected from the group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25,
and 27.

8. The isolated DNA molecule of claim 1 wherein the NORF gene
is not expressed in at least one phase of the cell cycle selected from the
group consisting of: log phase, S phase, and G2/M.

9. The isolated DNA molecule of claim 1 which is genomic.

10. The isolated DNA molecule of claim 1 which is cDNA. 

11. A method of using yeast genes to affect the cell cycle,
comprising the step of:
administering to a cell an isolated DNA molecule comprising a
yeast gene which is involved in cell cycle progression selected from the
differentially expressed genes identified in Tables 1, 2, 3, and 4.

12. The method of claim 11 wherein the cell is a yeast cell.

13. The method of claim 11 wherein the cell is a fungal cell.

14. The method of claim 11 wherein the cell is a mammalian cell.

15. The method of claim 11 wherein the yeast gene is selected from
the group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

16. The method of claim 11 wherein the yeast gene is selected from
the group consisting of: TEF1/TEF2, ENO2, ADH1, ADH2, PGK1,
CUP1A/CUP1B, and PYK1.

17. The method of claim 11 wherein the yeast gene is selected from
the group consisting of: YKL056C, YMR116C, YEL033W,
YOR182C, YCR013C, and YJR085C.

18. A method for screening candidate antifungal drugs, comprising
the steps of:
contacting a test substance with a yeast cell;
monitoring expression of a yeast gene which is involved in cell
cycle progression selected from the group of yeast genes identified in Tables
1, 2, 3, and 4, wherein a test substance which modifies the expression of the
yeast gene is a candidate antifungal drug.

19. The method of claim 18 wherein the yeast gene is selected from
the group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

20. The method of claim 18 wherein the yeast gene is selected from
the group consisting of: TEF1/TEF2, ENO2, ADH1, ADH2, PGK1,
CUP1A/CUP1B, and PYK1.

21. The method of claim 18 wherein the yeast gene is selected from
the group consisting of: YKL056C, YMR116C, YEL033W,
YOR182C, YCR013C, and YJR085C.

22. A method for identifying human genes which are involved in
cell cycle progression, comprising the steps of:
hybridizing a probe comprising at least 10 contiguous
nucleotides of a yeast gene which is differentially expressed between at least
two phases selected from the group consisting of log phase, S phase, and
G2/M phase, wherein the yeast gene is identified in Table 1, 2, 3, or 4.

23. The method of claim 22 wherein the yeast gene is selected from
the group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

24. The method of claim 22 wherein the yeast gene is selected from
the group consisting of: TEF1/TEF2, ENO2, ADH1, ADH2, PGK1,
CUP1A/CUP1B, and PYK1.

25. The method of claim 22 wherein the yeast gene is selected from
the group consisting of: YKL056C, YMR116C, YEL033W,
YOR182C, YCR013C, and YJR085C.

26. A probe for ascertaining phase in the cell cycle of a cell,
wherein the probe comprises at least 14 contiguous nucleotides of a
NORF gene as identified in Table 3 or 4.

27. The probe of claim 26 wherein expression of the NORF gene
varies by at least 10% between any two phases of the cell cycle selected
from the group consisting of: log phase, S phase, and G2/M.

28. The probe of claim 26 wherein expression of the NORF gene
varies by at least 25% between any two phases of the cell cycle selected
from the group consisting of: log phase, S phase, and G2/M.

29. The probe of claim 26 wherein expression of the NORF gene
varies by at least 50% between any two phases of the cell cycle selected
from the group consisting of: log phase, S phase, and G2/M.

30. The probe of claim 26 wherein expression of the NORF gene
varies by at least 100% between any two phases of the cell cycle
selected from the group consisting of: log phase, S phase, and G2/M.

31. The probe of claim 26 wherein the NORF gene is not expressed
in at least one phase of the cell cycle selected from the group consisting
of: log phase, S phase, and G2/M.

32. The probe of claim 26 wherein expression of the NORF gene
varies by a statistically significant difference (greater than 95%
confidence level) between any two phases of the cell cycle selected
from the group consisting of: log phase, S phase, and G2/M.

33. The probe of claim 32 wherein the gene is selected from the
group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

34. The method of claim 18 wherein said step of monitoring
expression is performed using nucleic acid molecules which are
immobilized on a solid support.

35. The method of claim 34 wherein the nucleic acid molecules are
in an array.

36. The method of claim 19 wherein a probe which comprises a
portion of said yeast gene is in an array on a solid support.

37. An array of probes on a solid support wherein at least one probe
comprises at least 14 contiguous nucleotides of a NORF gene as
identified in Table 3 or 4.

38. The array of claim 37 wherein the NORF gene is selected from
the group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27.

39. The array of claim 37 which comprises at least 100 probes of
distinct sequence.

40. The array of claim 37 which comprises at least 500 probes of
distinct sequence.

41. The array of claim 37 which comprises at least 1,000 probes
of distinct sequence.
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